Improving Evaluation of Document-level Machine Translation Quality Estimation

Authors

  • Timothy Baldwin
  • Qun Liu
  • Yvette Graham
  • Carolina Scarton
  • Qingsong Ma
  • Carla Parra Escartín
Abstract

Meaningful conclusions about the relative performance of NLP systems are only possible if the gold standard employed in a given evaluation is both valid and reliable. In this paper, we explore the validity of human annotations currently employed in the evaluation of document-level quality estimation for machine translation (MT). We demonstrate the degree to which MT system rankings depend on the weights employed in the construction of the gold standard, before proposing direct human assessment as a valid alternative. Experiments show direct assessment (DA) scores for documents to be highly reliable, achieving a correlation above 0.9 in a self-replication experiment, in addition to a substantial estimated cost reduction through quality-controlled crowdsourcing. The original gold standard based on post-edits incurs a 10–20 times greater cost than DA.
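As a rough illustration of the self-replication reliability check mentioned in the abstract, the sketch below correlates per-document DA scores collected in two independent annotation runs. This is a minimal sketch under stated assumptions, not the authors' code: it assumes Pearson's r as the correlation measure, and the score lists are hypothetical placeholders standing in for real averaged annotation runs.

```python
# Minimal sketch of a self-replication reliability check for document-level
# direct assessment (DA): score the same documents in two independent
# annotation runs and correlate the per-document averages.
# The scores below are hypothetical; the paper reports r > 0.9 on real runs.
from statistics import mean

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x ** 0.5 * var_y ** 0.5)

# Hypothetical standardized DA scores for the same five documents,
# averaged over annotators within each run.
run_a = [72.4, 55.1, 80.3, 61.7, 45.9]
run_b = [70.9, 57.0, 78.8, 63.2, 44.1]

print(f"self-replication correlation: {pearson(run_a, run_b):.3f}")
```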

Similar Papers

Combining Quality Prediction and System Selection for Improved Automatic Translation Output

This paper presents techniques for reference-free, automatic prediction of Machine Translation output quality at both sentence- and document-level. In addition to helping with document-level quality estimation, sentence-level predictions are used for system selection, improving the quality of the output translations. We present three system selection techniques and perform evaluations that quantify...
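A minimal sketch of sentence-level system selection as this snippet describes it, assuming each candidate translation can be scored by a reference-free QE model: for every source sentence, keep the candidate with the highest predicted quality. The function names and the length-based dummy scorer are hypothetical placeholders, not that paper's implementation.

```python
# Minimal sketch of system selection driven by sentence-level quality
# estimation: for each source sentence, keep the candidate translation
# with the highest predicted quality. The scoring function is a
# hypothetical stand-in for a trained QE model.
from typing import Callable, Sequence

def select_outputs(
    candidates: Sequence[Sequence[str]],      # candidates[i] = outputs of system i
    predict_quality: Callable[[str], float],  # reference-free score, higher = better
) -> list[str]:
    """Pick, per sentence, the translation with the highest predicted quality."""
    selected = []
    for sentence_candidates in zip(*candidates):  # all systems' outputs for one sentence
        selected.append(max(sentence_candidates, key=predict_quality))
    return selected

# Toy usage with a length-based dummy scorer standing in for a real QE model.
system_a = ["the cat sat", "it rains a lot"]
system_b = ["the cat sat down", "it rains"]
print(select_outputs([system_a, system_b], predict_quality=len))
```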

Searching for Context: a Study on Document-Level Labels for Translation Quality Estimation

In this paper we analyse the use of popular automatic machine translation evaluation metrics to provide labels for quality estimation at document and paragraph levels. We highlight crucial limitations of such metrics for this task, mainly the fact that they disregard the discourse structure of the texts. To better understand these limitations, we designed experiments with human annotators and p...

Automatic Detection of Machine Translated Text and Translation Quality Estimation

We show that it is possible to automatically detect machine-translated text at the sentence level from monolingual corpora, using text classification methods. We show further that the accuracy with which a learned classifier can detect text as machine-translated is strongly correlated with the translation quality of the machine translation system that generated it. Finally, we offer a generic machi...

The Correlation of Machine Translation Evaluation Metrics with Human Judgement on Persian Language

Machine Translation Evaluation Metrics (MTEMs) are central to Machine Translation (MT) engines, as the engines are developed through frequent evaluation. Although MTEMs are widespread today, their validity and quality for many languages are still in question. The aim of this research study was to examine the validity and assess the quality of MTEMs from the Lexical Similarity set on machine tra...

Document-Level Machine Translation Evaluation with Gist Consistency and Text Cohesion

Current Statistical Machine Translation (SMT) is significantly affected by the choice of Machine Translation (MT) evaluation metric. The emergence of document-level MT research increases the demand for corresponding evaluation metrics. This paper proposes two superior yet low-cost quantitative objective methods that enhance traditional MT metrics by modeling document-level phenomena from the perspective...

Publication date: 2017